Spatiotemporal Evolution of SARS-CoV-2 Alpha and Delta Variants during Large Nationwide Outbreak of COVID-19, Vietnam, 2021

We analyzed 1,303 SARS-CoV-2 whole-genome sequences from Vietnam, and found the Alpha and Delta variants were responsible for a large nationwide outbreak of COVID-19 in 2021. The Delta variant was confined to the AY.57 lineage and caused >1.7 million infections and >32,000 deaths. Viral transmission was strongly affected by nonpharmaceutical interventions.

accurate identification of cases of community transmission origin alongside the demographic data for analysis (4).

Nasopharyngeal Swabs and Sample Selection for Sequencing
The laboratories of National Hospital for Tropical Diseases (NHTD) in Hanoi, and the Hospital for Tropical Diseases (HTD) in Ho Chi Minh City, Vietnam were responsible for SARS-CoV-2 diagnosis and sequencing in Vietnam. Therefore, nasopharyngeal swab (NPS) samples submitted to NHTD and HTD laboratories for testing and sequencing were either from provincial CDC or from inpatients being treated at these two hospitals.
To increase the chance of successfully obtaining the virus genome, we first applied a preselection criterion based on the cycle threshold (Ct) value of the tested samples generated by the Lightmix Modular SARS-CoV-2 RdRp/E gene assay (Tib Molbiol, Berlin, Germany) (5).
This assay could detect both Alpha and Delta variants without compromising the sensitivity (6).
Accordingly, at NHTD, only NPS samples with a cycle threshold (Ct) value ≤30 for the RdRp gene were eligible, while at HTD, a sample with Ct value of ≤25 for the E gene was included.
Additionally, because of the availability of the resources, between January and June 2021 when community transmission remained limited, our approach focused on new community clusters detected through contact tracing under zero-COVID strategy. Between July and December 2021 during which community transmission was escalating, the selection of samples for sequencing was carried out by using WHO recommendations (7). Epidemiologic data, and data on infections and deaths were retrieved from e-hospital record or provided by the National Institute of Hygiene and Epidemiology and the Vietnamese Ministry of Health. Here, we focused our analysis on sequences obtained cases of locally acquired infection, and generated by NHTD or HTD laboratories, of which we had accurate sampling date and demographic data.
Our study formed part of the national COVID-19 response and was approved by the local

RNA Extraction, Whole-Genome Sequencing, and Sequence Assembly
Total RNA was extracted from NPS by using QIAamp viral RNA mini kit (Qiagen, Hilden, Germany) following the manufacturer's instructions, and finally eluted in 50μl of elution buffer (provided with the extraction kit). Whole genome amplification was performed on the extracted RNAby using either the long pooled amplicons protocol, developed by the University of Sydney (8,9), or the ARTIC V3 protocol (10) on an Illumina MiSeq platform as previously described. Library preparation was carried out by using the Nextera XT Library preparation kit (Illumina, USA), followed by library quantification by using KAPA Library Quant Kit (Kapa Biosystems, Wilmington, MA, USA), according to the manufacturer's instructions. The prepared library was sequenced by using iSeq reagent kit V2 (300 cycles) on a Miseq platform (Illumina).
For each run, tested samples were multiplexed and differentiated by double indexes by using IDT-ILMN Nextera DNA UD indexes (IDT).
Sequence assembly of the obtained sequencing data was carried out by using a referencebased mapping approach available in CLC genomics workbench (v.21.0.4) and Geneious 8.1.5 (Biomatters, San Francisco, CA, USA). This method involved mapping of sequencing output of individual samples to a reference genome (WuHan-Hu-1: NC_045512, Alpha: EPI_ISl_905782, Delta: EPI_ISL_1942165), followed by manual editing of the obtained consensus to ensure the accuracy of the results, as described previously (8). The consensus sequences generated in this study were submitted to the National Center for Biotechnology Information under the assigned accession numbers ON458864-ON459533, ON459545-ON459608, ON755375-ON755859, OQ415286-OQ415315, and OP647358-OP647411, and GISAID with assigned numbers:

Classifications of SARS-CoV-2 Variants
SARS-CoV-2 variant classification of the obtained consensuses was determined by using PANGO lineage (11) with pangolin v4.1.2 and pangolin-data v1.13 (12). The analysis was carried out by using the Ultrafast Sample Placement on Existing Trees option to assign the lineage based on the nearest lineage on existing global tree. Sequence alignment was carried out by using the tool available on Nextclade (13) and Minimap2 (14). Recombination detection was inferred by using a combination of Freyja v 1.3.10 (15) and sc2rf (16).

Maximum-Likelihood Framework to Study Genetic Relatedness of Alpha Variant Sequences
To explore the phylogenetic relationship of Vietnamese Alpha variant sequences obtained as part of the present study, we used a dataset consisting of the complete coding region (29,408 bp) of the obtained sequences and Alpha variant sequences randomly selected from those produced from the region and beyond (The U.S. and the UK) during the study period submitted GISIAD. We applied maximum likelihood (ML) method by using the TIM2 nt substitution model with invariant for Alpha variant as suggested by IQ tree (17). Support for individual nodes was assessed by using a bootstrap procedure with 1,000 replicates. To assess the placement of the Vietnamese variants in the context of global sequences, we used the phylogenetic framework incorporated in NextClade by using default setting (13), i.e., taking into account representatives of global sequences submitted to GISAID (Date of accession: 27 October 2022). Phylogenetic clusters were manually inspected by naked eyes.

Variant Sequences
In addition to applying PANGO lineage tool to classify the viral lineages of the Delta variant sequences as detailed above, we used maximum likelihood method and NextClade based phylogenetic analysis frameworks to assess the relatedness of the Delta sequences at national and global scale, respectively. For the former, we applied UNREST+F0+R4 as suggested by IQ tree (17). Support for individual nodes was assessed by using a bootstrap procedure with 1,000 replicates. For the latter, we used default setting as outline above.
To study the phylogeographics and phylodynamics of AY.57 lineage, the main lineage detected in Vientam in 2021, we applied Bayesian phylogenetic interference in BEAST v1.10.4 (18). We first excluded identical sequences and sequences of low quality (e.g., internal gaps).
We then used TempEst 1.5 to assess the temporal signal of the dataset (19). Subsequently, we excluded the sequences not conforming to a linear evolutionary pattern as suggested by To assess the effective population size trajectory, we applied Bayesian Skyride model by using the above framework. Finally, we used codon-based method (HyPhy) available in MEGA5 to measure the selection pressure on coding sequences of the pathogen genome, by estimating the ratio of nonsynonymous to synonymous substitution (dN/dS) at gene-wide level (25). The Hyphy used the joint maximum-likelihood reconstructions of ancestral states under a Muse-Gaut model of codon substitution.